Overview

These are steps I take to analyze the TilS growth curves. Some things in the parse_96wellGrowthCurves.R script are specific to the 48-hour curves I run, but I tried to make the other parts more general. So, you might need to tailor that script to match what the different plate readers give you.

The basic steps, however, are:

  1. Run a growth curve on either the VersaMax or Tecan Spark readers.
  2. Create a plate template that defines what samples and conditions are in which wells.
  3. Read in the plate(s) and template(s) to R.
  4. Format and combine data frames.
  5. Calculate the average absorbance readings and plot general growth curves.
  6. Use growthrates to fit curves to each individual replicate, tailoring to what measures I’m looking for. (Extract or derive measures.)
  7. Do a statistical comparison of the measures.
  8. Plot results.

This requires some familiarity with R, but hopefully not too much, plus a little understanding of ggplot2. I strongly encourage you to:

  1. Use RStudio
  2. Create a new project folder (or use an existing folder) to contain all files related to your project. This will be your working directory.
  3. Create an R-Notebook (which can be saved in the base directory or in a sub-folder; the latter is useful when you have more than one notebook in the same project).
  4. Copy/paste, modify, and run each bit of code in separate code chunks.

This will track exactly which lines of code you ran (and leave out what you did not), as well as document your analysis and save your results together, as in a lab notebook. (See this resource for more in-depth help with notebooks.) Using the code chunks in a notebook will also help with troubleshooting, as the pipeline will be broken up.

For general R help: R for Data Science, Google, Stack Overflow, and any results from STHDA and datanovia are very helpful. Searching for the exact wording of an error, or just for what you’re trying to accomplish in general, will typically turn up good results. I also like to use the “Environment” pane in RStudio to see what variables I have in use. Clicking on their names will also allow you to view them in matrix format, where you can hover over column names for data type info. This is often helpful for debugging.

(Also, I am a bit of a novice to coding, so there are definitely multiple ways of accomplishing similar goals. Some much more elegant than these. But I don’t know them yet.)

Dependencies

To start, these scripts rely on packages you may need to install (download) and then load (kind of like attaching). You may have tons of packages installed, but you only need to load the ones you will use for a given analysis. (Keep in mind that you need to reload the packages each time you open RStudio, e.g. if you stop mid-analysis.)

Most packages are stored in repositories that R can access easily. Typing install.packages( and then starting to type the name of the package may autofill what you want. If it doesn’t, enter the name in quotation marks. RStudio may ask you to pick a mirror site; choosing one geographically close is typically fastest. (If a package is not available on the CRAN repositories, visit that package’s website/git page. There should be installation instructions.)
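For example, installing and then loading one of the packages used below (growthrates is on CRAN) looks like:

```r
# One-time download from CRAN (quotation marks required)
install.packages("growthrates")

# Load it again in every new R session that uses it
library(growthrates)
```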

These packages are what I run. Some of them are not strictly necessary, but I use them in this tutorial and they make things easier/nicer; I have marked them. (To not load them, comment that line of code out with Ctrl+Shift+C while on that line, or while highlighting multiple lines.)

library(rstatix)
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter
library(openxlsx)
library(janitor)
## 
## Attaching package: 'janitor'
## The following object is masked from 'package:rstatix':
## 
##     make_clean_names
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(reshape2)
library(inauguration)
library(paletteer) # Not necessary
library(scales) # Not entirely necessary
library(RColorBrewer) # Not necessary
library(emmeans)
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:rstatix':
## 
##     select
## 
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
## 
##     geyser
library(multcompView)
library(DescTools)
## Registered S3 method overwritten by 'DescTools':
##   method        from        
##   print.palette inauguration
library(growthrates)
## Loading required package: lattice
## Loading required package: deSolve
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.1     v dplyr   1.0.0
## v tidyr   1.1.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x readr::col_factor() masks scales::col_factor()
## x purrr::discard()    masks scales::discard()
## x dplyr::filter()     masks rstatix::filter(), stats::filter()
## x dplyr::lag()        masks stats::lag()
## x dplyr::select()     masks MASS::select(), rstatix::select()
library(here) # Not necessary, just what I use to make pathing things easier, and only dependent on project sub-folders.
## here() starts at C:/Users/mjf123/Documents/R/Growth_Curves_in_R
source(here("parse_96wellGrowthCurves.R")) # This is the script I wrote, just make sure to path to wherever you keep it.

You should see a bunch of red text, some of which will talk about objects being masked from packages. The order of loading packages can matter: the last package loaded that uses a shared object name is the one that gets to use that name by default.

The last line should be a call that tells you the folder where the here package is rooting all of your paths out from. Now, whenever you use a line like here("a folder", "a file.pdf"), the function will look for “a file.pdf” in the “a folder” sub-folder of the base directory. (Typically the base directory is automatically set as the project folder that contains the “.Rproj” file.) In these examples, I’ve used the here package to create semi-nonspecific directions to the file (which is useful for when your project folders move around but the R-notebook and the files stay together; it assumes that the data structure within a project folder stays the same). The here function also removes system-specific jargon (Windows uses \ instead of / in file paths, unlike other OSs).

(As a note, the here function name is used by multiple packages, so sometimes you’ll think you are using here from the here package, but you are actually using the function from another package, say lubridate. If a line of code containing a here() is giving you problems, try replacing it with here::here(), which tells R to specifically go to the here package and use that version of the function. If we instead wanted the lubridate version, we would use lubridate::here() to override whatever R is defaulting to.)
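A minimal sketch of the fully qualified call (the file name here is hypothetical):

```r
library(here)

# The package::function() form bypasses any masking by other packages
here::here("data", "plate1.xlsx")  # "plate1.xlsx" is a hypothetical file
```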

These are packages that are helpful in making great plots, but not necessary for this tutorial (and so have been commented out).

# library(ggrepel) # Not necessary
# library(ggpubr)
# library(grid) # I think this is unnecessary.
# library(gtable) # I think ggplot already loads this.

As a general tip: the “Environment” tab in the upper right corner of RStudio can be really nice for quickly seeing what stored objects you have, and what is stored in them, by clicking on the blue arrow (to get a drop-down summary of structure and contents) or their name (to pop open a new tab, where you can see all of the contents and hover over the column names to see data types). This is great when trying to figure out what exactly a function (or you) did to your data.

The Data

These scripts require that you have two things for each plate:

  1. An xlsx file from the plate reader. (For xls files, just open in Excel and resave as an xlsx.)
  2. A plate template that matches each well with a sample and other specifics. (see plate_template.xlsx for a blank copy to fill-in.)

The template spread sheet must have columns named Well_ID, Strain, Condition, Replicate (within the plate), and Plate (helpful when multiple plates will be run in a series). There are also requirements for what you fill-in:

  • Empty wells should have “blank” as their sample name.
  • When running a series of plates, keep the naming (including capitalization) of strains and conditions the same between them.
  • The way the code is written (and to make your life easier later), do not use periods/dots or other special characters for strain, condition, etc. Hyphens and underscores are OK (underscores are “traditionally” used in place of spaces). While periods/dots are typically fine elsewhere, some of the formatting tricks in my code rely on them as delimiters between items.
  • If you have multiple biological replicates and technical replicates on a plate, use a hyphenated system to denote both in the Replicate column, i.e. “C-2” would be for technical replicate 2 of biological replicate C for a specific sample.
  • The Plate entries should be identical the entire way down the plate, and each plate should have its own template. (In theory you could use the same template file, and then change the plate name for each data frame individually.)

I keep a clean plate_template.xlsx, and then make a new copy to fill-in for each plate. (In the process of filling them out, I resort the columns to speed up numbering replicates. BUT resorting them into the A1-A12 format is difficult for Excel.)
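As a sketch, a few filled-in template rows might look like this as a data frame (the strain, condition, and plate names here are hypothetical):

```r
# Hypothetical example rows, matching the required template columns
template.example <- data.frame(
  Well_ID   = c("A1", "A2", "A3"),
  Strain    = c("WT", "tilS_mut", "blank"),   # empty wells get "blank"
  Condition = c("0ug-LB", "0ug-LB", "0ug-LB"),
  Replicate = c("A-1", "A-2", "A-1"),         # biological-technical, e.g. "C-2"
  Plate     = c("plate_1", "plate_1", "plate_1")
)
```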

The specifics of how you word items can help you out later on. For example:

  • If you have multiple independent condition variables, label them with systematic combinations of conditions separated by hyphens or underscores. If I ran a plate varying both antibiotic concentrations and media, I would fill in the condition column with a consistent hyphenated combination of the two, like “0ug-LB” and “2ug-M9”. This way, these details can be automatically recovered later on using a simple separate command.
  • It can also be helpful to name the plates themselves sequentially with the experiment name rather than just with a date.
  • To save you time later in plotting, it can be useful to use spaces and capitalization as you would like to see them in your labels. This is typically a terrible idea for programming, but R seems to do OK with it (as long as you use double quotes for variable values, “tilS N274Y”, and backticks for variable names, `Time Elapsed`).

Reading in Data

The plate reader you use will determine which function you will call.

  • parse_growth_Spark for the Tecan Spark (coming from Tecan’s Spark Control software)
  • parse_growth_Versa_xlsx for the two VersaMax (coming from SoftMax Pro software, and saved as an xlsx)

For the Tecan Spark, if you are using timing other than reads every 10 minutes over 48 hours, you will need to adjust the code for the parse_growth_Spark function so that the rows it reads match the rows where you have data.

Both functions expect:

  1. The data file with a path. (e.g. “C:\Documents\1.xlsx”)
    • here() Explanation
  2. The template file with a path.
  3. What format you want the output data in (options are “R”, “prism”, “python”). This mostly affects how it formats the time column.
  4. Whether or not to remove the media blanks.
    • If you inoculated with enough cells to affect the OD and you don’t wish to use this value as the baseline, make this parameter FALSE. You will then want to modify the format_GC function in the R script itself. See the comments in that section for help.
    • After modification, you’ll need to save the parse_96wellGrowthCurves.R file and re-run the source(here...) line to load the changes.
  5. A folder where it will write a copy of the parsed out data as an xlsx file. It will create the folder if it does not exist already.

Here I’m loading 5 different plates, all from the same series.

R1 <- parse_growth_Spark(datawithpath = here("data", "1-10-2021_RKS-swap1.xlsx"),
                         templatewithpath = here("data", "1-10-2021_plate_template.xlsx"),
                         output = "R", remove_emptys = TRUE, outputfoldername = "parsed")
## Saving:  C:/Users/mjf123/Documents/R/Growth_Curves_in_R/parsed/1-10-2021_RKS-swap1_parsedforR.csv
C2 <- parse_growth_Spark(here("data", "1-17-2021_CSH-swap2.xlsx"),
                         here("data", "1-17-2021_plate_template.xlsx"),
                          "R", remove_emptys = TRUE, outputfoldername = "parsed")
## Saving:  C:/Users/mjf123/Documents/R/Growth_Curves_in_R/parsed/1-17-2021_CSH-swap2_parsedforR.csv
C3 <- parse_growth_Spark(here("data", "1-20-2021_CSH-swap3.xlsx"),
                         here("data", "1-20-2021_plate_template.xlsx"),
                          "R", remove_emptys = TRUE, outputfoldername = "parsed")
## Saving:  C:/Users/mjf123/Documents/R/Growth_Curves_in_R/parsed/1-20-2021_CSH-swap3_parsedforR.csv
R2 <- parse_growth_Spark(here("data", "1-22-2021_RKS-swap2.xlsx"),
                         here("data", "1-22-2021_plate_template.xlsx"),
                         "R", remove_emptys = TRUE, outputfoldername = "parsed")
## Saving:  C:/Users/mjf123/Documents/R/Growth_Curves_in_R/parsed/1-22-2021_RKS-swap2_parsedforR.csv
R3 <- parse_growth_Spark(here("data", "1-24-2021_RKS-swap3.xlsx"),
                         here("data", "1-24-2021_plate_template.xlsx"),
                         "R", remove_emptys = TRUE, outputfoldername = "parsed")
## Saving:  C:/Users/mjf123/Documents/R/Growth_Curves_in_R/parsed/1-24-2021_RKS-swap3_parsedforR.csv

These functions will give each replicate its own sample_name of the format [strain].[condition].[platename].[replicate number on plate], and then return the following data frames (as a list):

  1. The data as a data frame with each well as its own column and a row for each time point (this can be useful as prep for other packages).
names(R1[[1]])
##  [1] "time"                             "A244T.CSH.Ca_Mg_swap_1.1"        
##  [3] "A244T.CSH.Ca_Mg_swap_1.2"         "A244T.CSH.Ca_Mg_swap_1.3"        
##  [5] "A244T.RKS.Ca_Mg_swap_1.1"         "A244T.RKS.Ca_Mg_swap_1.2"        
##  [7] "A244T.RKS.Ca_Mg_swap_1.3"         "A244T.RKS_mod.Ca_Mg_swap_1.1"    
##  [9] "A244T.RKS_mod.Ca_Mg_swap_1.2"     "A244T.RKS_mod.Ca_Mg_swap_1.3"    
## [11] "N274Y.CSH.Ca_Mg_swap_1.1"         "N274Y.CSH.Ca_Mg_swap_1.2"        
## [13] "N274Y.CSH.Ca_Mg_swap_1.3"         "N274Y.RKS.Ca_Mg_swap_1.1"        
## [15] "N274Y.RKS.Ca_Mg_swap_1.2"         "N274Y.RKS.Ca_Mg_swap_1.3"        
## [17] "N274Y.RKS_mod.Ca_Mg_swap_1.1"     "N274Y.RKS_mod.Ca_Mg_swap_1.2"    
## [19] "N274Y.RKS_mod.Ca_Mg_swap_1.3"     "N445K.CSH.Ca_Mg_swap_1.1"        
## [21] "N445K.CSH.Ca_Mg_swap_1.2"         "N445K.CSH.Ca_Mg_swap_1.3"        
## [23] "N445K.RKS.Ca_Mg_swap_1.1"         "N445K.RKS.Ca_Mg_swap_1.2"        
## [25] "N445K.RKS.Ca_Mg_swap_1.3"         "N445K.RKS_mod.Ca_Mg_swap_1.1"    
## [27] "N445K.RKS_mod.Ca_Mg_swap_1.2"     "N445K.RKS_mod.Ca_Mg_swap_1.3"    
## [29] "P421L.CSH.Ca_Mg_swap_1.1"         "P421L.CSH.Ca_Mg_swap_1.2"        
## [31] "P421L.CSH.Ca_Mg_swap_1.3"         "P421L.RKS.Ca_Mg_swap_1.1"        
## [33] "P421L.RKS.Ca_Mg_swap_1.2"         "P421L.RKS.Ca_Mg_swap_1.3"        
## [35] "P421L.RKS_mod.Ca_Mg_swap_1.1"     "P421L.RKS_mod.Ca_Mg_swap_1.2"    
## [37] "P421L.RKS_mod.Ca_Mg_swap_1.3"     "tRNA-Ile2.CSH.Ca_Mg_swap_1.1"    
## [39] "tRNA-Ile2.CSH.Ca_Mg_swap_1.2"     "tRNA-Ile2.CSH.Ca_Mg_swap_1.3"    
## [41] "tRNA-Ile2.RKS.Ca_Mg_swap_1.1"     "tRNA-Ile2.RKS.Ca_Mg_swap_1.2"    
## [43] "tRNA-Ile2.RKS.Ca_Mg_swap_1.3"     "tRNA-Ile2.RKS_mod.Ca_Mg_swap_1.1"
## [45] "tRNA-Ile2.RKS_mod.Ca_Mg_swap_1.2" "tRNA-Ile2.RKS_mod.Ca_Mg_swap_1.3"
## [47] "WT.CSH.Ca_Mg_swap_1.1"            "WT.CSH.Ca_Mg_swap_1.2"           
## [49] "WT.CSH.Ca_Mg_swap_1.3"            "WT.RKS.Ca_Mg_swap_1.1"           
## [51] "WT.RKS.Ca_Mg_swap_1.2"            "WT.RKS.Ca_Mg_swap_1.3"           
## [53] "WT.RKS_mod.Ca_Mg_swap_1.1"        "WT.RKS_mod.Ca_Mg_swap_1.2"       
## [55] "WT.RKS_mod.Ca_Mg_swap_1.3"
nrow(R1[[1]])
## [1] 288
  2. A “melted” or “long” format that is useful for plotting and combining. Here column names do not depend on sample names. The total length is much longer, specifically 54 times longer = 6 (number of strains) × 3 (number of conditions) × 3 (number of replicates of each sample).
names(R1[[2]])
## [1] "sample_name" "strain"      "condition"   "plate"       "rep"        
## [6] "time"        "absorbance"
nrow(R1[[2]])
## [1] 15552
  3. A data frame that just contains the sample names and conditions.
names(R1[[3]])
## [1] "sample_name" "strain"      "condition"   "plate"       "rep"

(Note that 1 and 2 are typically interconvertible, and 3 can be derived from 1 or 2.)
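For instance, format 1 can be melted into something close to format 2 with tidyverse tools (a sketch; it assumes sample_name contains no dots beyond the three separating [strain].[condition].[plate].[rep]):

```r
# Wide (one column per well) -> long (one row per well per time point),
# then recover the sample details by splitting sample_name on the dots
long <- R1[[1]] %>%
  pivot_longer(-time, names_to = "sample_name", values_to = "absorbance") %>%
  separate(sample_name, into = c("strain", "condition", "plate", "rep"),
           sep = "\\.", remove = FALSE)
```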

Initial look at the data

The plot_from_melted_plate_faceted and plot_from_melted_plate functions are both simple functions, meant to be a fast way to look at each plate. (They can take more than one plate’s worth of data in one data frame, but formatting and customizability are limited.) The first returns a single plot faceted by whatever your different conditions are; the second returns one plot per condition. Both require you to give them the data frame (no path needed this time), a (named) color vector for coloring by strain, a title in string format, and the type of scaling on the y-axis you would like (“straight”, “log2”, “log10”, “natural_log”).

Color vectors

Unnamed color vectors.

If I had different strains across different plates and didn’t care that colors matched, I could just create an unnamed color vector.
I’m going to use the paletteer package’s palette constructors. The package makes accessing a lot of the preset palettes easier; auto-fill is your friend here (start typing without quotation marks). It also has other functions, aside from the palette constructors, for changing colors within the plotting call when making your plot manually.
Here I’m going to call three colors from a discrete (that’s what the d is for) palette.

sixcolors.noname <- paletteer_d("nationalparkcolors::Saguaro", n = 6)
show_col(sixcolors.noname)

If I had really specific colors that I knew the hex codes for, I could also enter them as a vector of strings in place of the sixcolors.noname vector.
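For instance (hex codes here chosen arbitrarily):

```r
# Manually specified hex codes instead of a palette constructor
sixcolors.manual <- c("#1B9E77", "#D95F02", "#7570B3",
                      "#E7298A", "#66A61E", "#E6AB02")
show_col(sixcolors.manual)
```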

Named color vectors

I like named color vectors to keep colors consistent across plots. For this, we just need to make a vector of color codes, each entry named to match the levels of whatever variable you’d like to color on. Here we’re matching the strain names.

First, find how many colors you’ll need. Here I know that the strains/samples I ran on each plate are all the same, so I’m only checking R1.

unique(R1[[2]]$strain)
## [1] "A244T"     "N274Y"     "N445K"     "P421L"     "tRNA-Ile2" "WT"
length(unique(R1[[2]]$strain))
## [1] 6

Specifically, this call looks at all of the values under “strain” in the second data frame in the list, and lists the unique values. The length() function counts them, which is useful if you have a lot of something.
If we had different strains on each plate, but wanted to keep the colors consistent, you would need to append the strain lists together (and maybe save the combined list, as that seems like it would be handy).

otherstrains <- c("P421L", "P421L", "tRNA-Lys", "PPC", "PPC", "tRNA-Lys", "A244T", "WT")
total_strains <- unique(c(R1[[2]]$strain, otherstrains, R2[[2]]$strain))
total_strains
## [1] "A244T"     "N274Y"     "N445K"     "P421L"     "tRNA-Ile2" "WT"       
## [7] "tRNA-Lys"  "PPC"
length(total_strains)
## [1] 8

Now we can use a palette constructor to generate eight colors, and use names() to set the names of each color.

eightcolors <- paletteer_d("vapoRwave::hotlineBling", 8)
show_col(eightcolors)

If I don’t yet care about the ordering of colors, I can use the previous type of call to make assigning names easy, without typing them all out.

names(eightcolors) <- total_strains

If you wish to start assigning names to specific colors, note the order of the colors, then enter the names of each strain in a list in the matching order.

Note that because of the order of the two lists, the cyan and dark blue colors will never be shown in the following graphs, because those strains don’t actually appear on these plates.

The quick plots of all data

plot_from_melted_plate_faceted(.data = R1[[2]], colors = eightcolors, title = "Modified RKS, plate 1", yscale = "straight")

This can be hard to see with the scaling, so you might want to be able to page through each condition. Note that the condition plotted is appended to the end of your title.

plot_from_melted_plate(R1[[2]], eightcolors, "Modified RKS, plate 1", "straight")
## [[1]]

## 
## [[2]]

## 
## [[3]]

These sorts of quick plots are good for giving you a general idea of what is going on. They also show that I should cut off my data around 34 to 36 hours. You should run through all of your plates to see how each replicate did as well. From these, I already know that CSH-plate1 was no good, so that is why there isn’t a C1.

Combining data

The melted format (where all plates have the same column names) makes combining data easy. We will simply attach each data frame to the bottom of the last.

allplates <- bind_rows(R1[[2]], R2[[2]], R3[[2]], C2[[2]], C3[[2]])
nrow(allplates)
## [1] 77760

Again we can also get an idea of what all of the data looks like on the same plot via individual replicates.

plot_from_melted_plate(allplates, sixcolors.noname, "All plates", "straight")
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

You can see that my CSH_mod conditions definitely vary by plate, but overall trends are maintained.

Formatting the data

In order to start getting things ready to plot and to analyze, we should remove the baseline from the absorbance readings, and order the strains and conditions in a meaningful way. We can do that with format_GC, which will remove the baseline from each well’s readings (defined as the minimum value measured across all times), and allow you to order the strains and conditions in a way that makes sense for you. It also makes both of these variables the “factor” type, which is great for categorical variables, and often required for certain statistical analyses.

To start, let’s see how R ordered my strains and conditions:

unique(allplates$strain)
## [1] "A244T"     "N274Y"     "N445K"     "P421L"     "tRNA-Ile2" "WT"
unique(allplates$condition)
## [1] "CSH"     "RKS"     "RKS_mod" "CSH_mod"

This corresponds to the order in which R read them in, and the order in which each plate’s data frame was attached. This is fine, but I would like to compare to WT (so list it first), and the conditions are step-wise variations. So, matching the exact names already being used, enter them in lists in the order you wish them to appear. This can be helpful when you are using numbers that are not actually in “order” but rather are names.

df <- format_GC(allplates,
                strainfactors = c("WT", "A244T", "N274Y", "N445K", "P421L", "tRNA-Ile2"),
                conditionfactors = c("RKS", "RKS_mod", "CSH_mod", "CSH"))
unique(df$strain)
## [1] A244T     N274Y     N445K     P421L     tRNA-Ile2 WT       
## Levels: WT < A244T < N274Y < N445K < P421L < tRNA-Ile2
unique(df$condition)
## [1] CSH     RKS     RKS_mod CSH_mod
## Levels: RKS < RKS_mod < CSH_mod < CSH

This function can only handle formatting one condition variable; however, if you have more than one…

Multiple independent condition variables

In this specific experiment, I actually have two condition factors (Ion and Buffer recipe/levels) that I’ve been treating as one condition factor (and which I will still use as an example later). When it comes time for the stat analysis, I’d like to be able to control and compare using these. As might be obvious from the condition labels, each condition value dictates the values of two other factors I’d like to be able to analyze: Buffer and Ions.

The exact breakdown of my setup looks like:

Condition value   Buffer   Ion formula
RKS               RKS      RKS
RKS_mod           RKS      CSH
CSH               CSH      CSH
CSH_mod           CSH      RKS

So now I’d like to extract these two variables from the condition variable.

Depending on how the condition variable is structured, you will have two different approaches to extracting the levels/values for each variable:

  1. Manual-case-by-case assignment or
  2. Automatic separation of a compound condition.

Manual assignment.

If the condition names are not easily broken down into sub-variables, you’ll have to assign them case by case.

Because I did not name my conditions with compound conditions, extracting the two different variables from one condition variable will take a little work. The general flow is: for each new variable, examine what is in the original condition variable at each row, and assign a value on a case-by-case basis. The conditions were named after the buffering components of the media, so I assigned Buffer using a general search for the two different buffers. The Ions required an individual assignment for each possible value of the condition variable.

To add extra variable columns, we’ll use the following functions:

  • piping %>% to continuously modify the data frame df, by passing it through functions that then “return” a modified df to hand off
  • mutate to create a new column/variable and assign it values at each row
  • case_when to conditionally assign values to this new variable based on the values in other columns/variables for that row
    • There is an item for each case we want to consider, demarcated by a comma
    • The left side of the case (before the tilde ~) is the case we’re checking for. This should evaluate to a boolean (TRUE/FALSE).
    • The right side of the case (after the ~) is the value to store in the variable at that row. Here we have character strings.
    • R will go through each row of the data frame, and check for each case in order, until it finds a true evaluation. It then assigns the corresponding value to the new variable, and moves to the next row. (No later conditions will be checked.)
  • grepl to find the values we want. Because of how I named these conditions, I need to search for any instance of “RKS” or “CSH” in the condition variable, regardless of whether it is by itself or part of a larger string. (Note, I could just have repeated what I did for the Ions variable up in the Buffer part, and written out each case explicitly, but I let R do the legwork for me.)
  • factor to turn each variable into a factor, with a specific ordering. (This is how the format_GC function works.)
dfExtended <- df %>% mutate(
  Buffer = case_when(grepl("RKS", condition) ~ "RKS",
                     grepl("CSH", condition) ~ "CSH"),
  Ions = case_when(condition == "RKS" ~ "RKS",
                   condition == "RKS_mod" ~ "CSH",
                   condition == "CSH" ~ "CSH",
                   condition == "CSH_mod" ~ "RKS")
  ) %>%
  mutate(
    Buffer = factor(Buffer, levels = c("RKS", "CSH"), ordered = TRUE),
    Ions = factor(Ions, levels = c("RKS", "CSH"), ordered = TRUE)
  )

Automatic separation

Compound naming makes this part super easy. The separate function takes a column (here condition), looks for the separator on which to split the values (here a hyphen), and then stores the pieces sequentially in the new columns you give it (the into = parameter with the list of variable names). I tell it to keep the original column intact (the remove = FALSE parameter). The function expects as many pieces as you give it variables: this example expects two hyphens and three chunks of text for each entry in the condition column. It will warn you if there are too many or too few pieces. If you know that there will be discrepancies, you can dictate the function’s behavior with the extra = and fill = arguments respectively (see the separate documentation for more help.)

(This part of code is hypothetical)

dfExtended <- df %>% separate(condition, into = c("First Variable", "Second Variable", "Third Variable"),
                              remove = FALSE,
                              sep = "-") %>%
  mutate(
    Buffer = factor(Buffer, levels = c("RKS", "CSH"), ordered = TRUE),
    Ions = factor(Ions, levels = c("RKS", "CSH"), ordered = TRUE)
  )

Because using two condition variables is just an extension of using one, I will proceed through this example with two variables. (Typically, simplifying down to one just means deleting the second variable.)

Plotting the means

We can also get a quick plot of the means at each time point using mean_OD and bare_meanplot. mean_OD just needs the data frame you wish to calculate the mean ODs for, and how you want to group the data. It will return a data frame with columns for the mean of the baseline-corrected absorbance and error stats.
Most often for general growth curve representation, we want to group by time, strain, and condition. The grouping variables need to be entered in a list with quotes around each. There is a row for each strain/condition combination at each time point. (Here 6 strains × 4 conditions × 288 time points.)

df.means <- mean_OD(dfExtended, c("time", "strain", "Buffer", "Ions"))
nrow(df.means)
## [1] 6912

Note that none of the raw data is in the data frame.
To plot this, bare_meanplot needs the mean data and a (named) color vector. It creates an extremely basic ggplot to which you can add faceting, scaling, etc., as it can take a lot of fine-tuning to get a final plot you are happy with. I’m going to store it in a variable so that I can use it as a base later and add other ggplot layers.

means.plot <- bare_meanplot(df.means, eightcolors)
means.plot

This looks terrible! That’s because we have multiple conditions all mixed together, and ggplot is trying to put them in one line.

To break up by conditions, use facet_wrap or facet_grid. Here I’m also using ggsave to save a copy of the plot as a pdf (the function adjusts the format appropriately if you just change the extension of the file name) with modified dimensions.

testplot <- means.plot + facet_wrap(Buffer + Ions ~ .)
testplot

ggsave(here("testplot.pdf"), testplot, height = 8, width = 6)

Note that the colors stayed consistent with my first plots, even though the order changed.

To get fancy, I’d like to trim my x-axis and re-label the faceting variable by both the variable and its value.

means.plot +
  scale_x_continuous(limits = c(10, 40)) +
  facet_wrap( Buffer + Ions ~ . , labeller = label_both)
## Warning: Removed 642 row(s) containing missing values (geom_path).

For single-variable conditions (and further examples of labeling)

Returning again to the single-variable condition: to display names other than what is in the condition variable, we’ll use a trick similar to the named color vectors and make a named list of labels. Because we’ll feed them into ggplot’s labelling functions, you can use special notation (with label_wrap_gen) or plotmath notation (with label_parsed) to get italics, sub/superscripts, and symbols. (Google will be your best friend here, as it always is with all coding.) We will use a much messier version later, too.
The list assigned in the names call must match the values in the condition variable. Again, make sure that the orders of the two lists match each other (they don’t necessarily have to match the display order).

conds.labels <- c("RKS Buffer\nRKS Ions", "RKS Buffer\nCSH Ions", "CSH Buffer\nRKS Ions", "CSH Buffer\nCSH Ions")
names(conds.labels) <- c("RKS", "RKS_mod", "CSH_mod", "CSH")

OK, for the plot (even though the means are the same as before, the condition variable was not included before, so we’re simply making another (almost identical) data frame):

df.means.single <- mean_OD(dfExtended, c("time", "strain", "condition"))
means.plot.single <- bare_meanplot(df.means.single, eightcolors)
means.plot.single +
  scale_x_continuous(limits = c(10, 40)) +
  facet_wrap(condition~., labeller = as_labeller(conds.labels, label_wrap_gen(width = 12)))
## Warning: Removed 642 row(s) containing missing values (geom_path).

Warnings about removed rows stem from the fact that I have essentially censored the data, i.e. ggplot is confused that there are rows with values that are not being displayed.

More options for ggplot2

For more options and help tweaking the plots, I really like the STHDA tutorials that pop up in Google search results for “ggplot2 + ______”, like “change axis ticks” or “title”. I tried to keep what I declared minimal, so that ggplot would not get angry about things getting rewritten; but, if it causes problems, you can modify either a copy of the bare_meanplot code or the script itself.

Other groupings

Just as an example, let’s say I wanted to see the averages of each plate (instead of letting them average together). We’ll use mean_OD again, but with another grouping variable.

df.plateMeans <- mean_OD(dfExtended, c("time", "strain", "Buffer", "Ions", "plate"))

Now we’ll plot in a grid, with the conditions as columns and plates as rows. (Because the facet labels don’t include values for the plates, I had to modify the labeller call to specifically say, “Only apply this function to the condition variables.” I could instead either append more values onto the named labels, or create another named list of labels and add a second item in that call: labeller = labeller(Ions = ..., plate = ...).)

bare_meanplot(df.plateMeans, eightcolors) +
  scale_x_continuous(limits = c(10, 40)) +
  scale_y_continuous(limits = c(0, 1.3)) +
  facet_grid(plate ~ Buffer + Ions, labeller = labeller(Ions = label_both, Buffer = label_both))
## Warning: Removed 642 row(s) containing missing values (geom_path).

More plot options

Extracting Measurements

This part relies heavily on the growthrates package, which has a helpful tutorial here with explanations of its different functions and the models they use.

All of them will need a data frame with baseline-removed absorbance data, but no negative values.
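A quick sanity check (assuming the baseline-corrected column is named OD, as in the fitting calls below):

```r
# There should be no negative baseline-corrected readings before fitting
any(dfExtended$OD < 0)
summary(dfExtended$OD)
```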

To begin, growthrates gives you two groups of model options to fit, with different pros and cons: the simpler “easy linear” and “smoothing splines”, or the more complex parametric non-linear models.

Easy linear and smoothing splines

The easiest to fit, but giving the least info, are the “easy linear” and “smoothing splines” fits. They require minimal input and take the shortest amount of time to run (although that can still be several minutes, depending on the number of plates’ worth of data you are analyzing). The only information you get from these is v-max (maximum early log-linear growth rate, written as mumax for the Greek letter mu, shorthand for growth rate), lag, and the y-intercept (which we know should be around 0).

The function itself returns the fit and not the parameters or a table, which can take a little getting used to.

rates.easy <- all_easylinear(OD ~ time | sample_name, data = dfExtended)

You can see each of the fits. This gets its own line because it’s large.

plot(rates.easy)

The coefficients themselves can be extracted from the fits with coef, though for some reason the result needs to be coaxed into being a data frame.

rates.easy.coef <- as.data.frame(coef(rates.easy)) %>% rownames_to_column(var = "sample_name")

Parametric non-linear

These models fit the full logistic growth equation to each curve, meaning that they return carrying capacity in addition to v-max and lag.
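A sketch with growthrates’ all_growthmodels and its grow_logistic model (the starting values in p and the bounds are guesses you will need to tune to your own data):

```r
# Starting guesses and box constraints for y0 (initial OD),
# mumax (growth rate), and K (carrying capacity)
p     <- c(y0 = 0.01, mumax = 0.2, K = 1.0)
lower <- c(y0 = 1e-4, mumax = 0,   K = 0.1)
upper <- c(y0 = 0.1,  mumax = 2,   K = 2.0)

rates.logistic <- all_growthmodels(OD ~ grow_logistic(time, parms) | sample_name,
                                   data = dfExtended,
                                   p = p, lower = lower, upper = upper)

# Extract the fitted parameters, as with the easy-linear fits
rates.logistic.coef <- as.data.frame(coef(rates.logistic)) %>%
  rownames_to_column(var = "sample_name")
```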

Analysis

General ANOVA first.

Across all samples, within a condition. (For all conditions.)

Across all conditions, within a sample. (For all samples.)
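As a hedged sketch of these comparisons, using the easy-linear coefficients from above (this assumes the sample_name format [strain].[condition].[plate].[rep], and uses a plain one-way ANOVA plus emmeans, which is one of several reasonable choices):

```r
# Recover strain/condition/plate/rep from the sample_name format
rates.info <- rates.easy.coef %>%
  separate(sample_name, into = c("strain", "condition", "plate", "rep"),
           sep = "\\.", remove = FALSE)

# Across all samples, within one condition (repeat per condition)
fit.RKS <- aov(mumax ~ strain, data = filter(rates.info, condition == "RKS"))
summary(fit.RKS)

# Pairwise post-hoc comparisons between strains
emmeans(fit.RKS, pairwise ~ strain)
```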

Plotting